Abstract

For a joint probability density function $f_X(x)$ of a random vector $X$ the
mixed partial derivatives of $\log f_X(x)$ can be interpreted as limiting
cumulants in an infinitesimally small open neighborhood around $x$. Moreover,
setting them to zero everywhere gives independence and conditional
independence conditions. The latter conditions can be mapped, using an
algebraic differential duality, into monomial ideal conditions. This provides an
isomorphism between hierarchical models and monomial ideals. It is thus
shown that certain monomial ideals are associated with particular classes of
hierarchical models.

Keywords: Differential cumulants, conditional independence, hierarchical mod- 
els, monomial ideals. 



1 Introduction 

This paper draws together three areas: a new concept of differential cumulants,
hierarchical models and the theory of monomial ideals in algebra. The central
idea is that for a strictly positive density $f_X(x)$ of a $p$-dimensional random vector
$X$, the mixed partial derivatives of the log density $g_X(x) = \log f_X(x)$ can be
used to express independence and conditional independence statements. Thus, for
random variables $X_1, X_2, X_3$ in $\mathbb{R}$, the condition

$$\frac{\partial^2}{\partial x_1 \partial x_2}\, g_{X_1,X_2,X_3}(x_1,x_2,x_3) = 0 \quad \text{for all } (x_1,x_2,x_3) \text{ in } \mathbb{R}^3 \qquad (1)$$

is equivalent to the conditional independence statement

$$X_1 \perp X_2 \mid X_3.$$

In the next section we show how such mixed partial derivatives can be interpreted
as differential cumulants. Then, in Section 3, we show how collections of
differential equations like (1) can be used to express independence and conditional
independence models. Section 4 shows that, more generally, these collections can
be used to define hierarchical statistical models of exponential form.

Section 5 maps the hierarchical model conditions to monomial ideals, which
are increasingly being used within algebraic statistics. This isomorphism maps,
for example, the mixed partial derivative condition (1) to the monomial ideal
$\langle x_1 x_2 \rangle$ within the polynomial ring $k[x_1, x_2, x_3]$. The equivalence allows
ideal properties to be interpreted as hierarchical model properties, opening up
an algebraic-statistical interface with some potential.



2 Local and differential cumulants 



This section can be considered as a development from a body of work on local
correlation. Good examples are the papers of Holland & Wang (1987), Jones (1996)
and Bairamov et al. (2003). We draw particularly on Mueller & Yan (2001).

Let $X \in \mathbb{R}^p$ be a random vector. We assume $X$ has a $p+1$ times
continuously differentiable density $f_X$. Once we introduce the concept of differential
cumulants, we further require $f_X$ be strictly positive.

For $x, k$ in $\mathbb{R}^p$ we set $x^k := \prod_{i=1}^p x_i^{k_i}$ and $m_k := E\left(\prod_{i=1}^p X_i^{k_i}\right) = E(X^k)$. Let
$M_X : \mathbb{R}^p \to \mathbb{R}$ and $K_X : \mathbb{R}^p \to \mathbb{R}$ denote the moment and cumulant generating
functions of $X$ respectively. For a vector $k$ in $\mathbb{N}^p$ we set

$$D^k f(x) := \frac{\partial^{\|k\|_1}}{\partial x_1^{k_1} \cdots \partial x_p^{k_p}} f(x),$$

where $\|k\|_1 := \sum_{i=1}^p |k_i|$ is the Manhattan norm. By convention $D^0 f(x) := f(x)$.
The cumulant $\kappa_k$ can be found by evaluating $D^k \log(M_X(t))$ at zero. We use
the multivariate chain rule (given e.g. in Hardy (2006)) stated in Theorem 1. At the
heart of the chain rule is an identification of differential operators with multisets:

Definition 1 (Multiset, multiplicity, size). A multiset $M$ is a set which may hold
multiple copies of its elements. The number of occurrences of an element is its
multiplicity. The multiplicity of a multiset is the vector of multiplicities of its
elements, denoted by $\nu_M$. The total number of elements $|M|$ in $M$ is the size. A
multiset which is a set is called degenerate.



Example 1 (Partial derivative and multiset). The partial derivative
$\frac{\partial^4}{\partial x_1 \partial x_3^3} f(x)$ has associated multiset $\{1, 3, 3, 3\}$
with multiplicity $(1, 0, 3)$ and size four.



Definition 2 (Partition of a multiset). Let $I$ be some index set and $(M_i)_{i \in I}$ be a
family of multisets with associated family of multiplicities $(\nu_{M_i})_{i \in I}$. A partition
$\pi$ of a multiset $M$ is a multiset of multisets $\{(M_i)_{i \in I}\}$ such that $\nu_M = \sum_{i \in I} \nu_{M_i}$.
Being a multiset itself, a partition can hold multiple copies of one or more
multisets.

Example 2 (Partition of a multiset). The multiset $\{\{x_1, x_3\}, \{x_1, x_3\}, \{x_3\}\}$ is a
partition of $\{x_1, x_1, x_3, x_3, x_3\}$, since $(1, 0, 1) + (1, 0, 1) + (0, 0, 1) = (2, 0, 3)$. In
the following, we will use the shorthand $\{x_1x_3|x_1x_3|x_3\}$.

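Associated with a partition $\pi$ of a multiset $M$ is a combinatorial quantity to
which we refer as the collapse number $c(\pi)$. It is defined as

$$c(\pi) := \frac{\nu_M!}{\nu_\pi! \prod_{j=1}^{|\pi|} \nu_{M_j}!},$$

where the factorial of a multiplicity vector is the product of the factorials of its
entries and $\nu_\pi$ is the multiplicity of $\pi$ regarded as a multiset of multisets.
See Hardy (2006) for a combinatorial interpretation of $c(\pi)$.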
Theorem 1 (Higher order derivative of chain functions).

$$D^k g(h(x)) = \sum_{\pi \in \Pi(k)} c(\pi)\, D^{|\pi|} g(h) \prod_{j=1}^{|\pi|} D^{\nu_{M_j}} h(x),$$

where $\Pi(k)$ is the set of all partitions of a multiset with multiplicity $k$ and $M_j$ is
the $j$-th multiset in the partition $\pi$.

Proof. See Hardy (2006). □

Corollary 1 (Cumulants as functions of moments). Let $\kappa_k$ be the $k$-th cumulant.
Then

$$\kappa_k = \sum_{\pi \in \Pi(k)} c(\pi) (-1)^{|\pi|-1} (|\pi|-1)! \prod_{j=1}^{|\pi|} m_{\nu_{M_j}}. \qquad (2)$$

Proof. Set $g(h) = \log(h)$, $h(t) = M_X(t)$ and evaluate at $t = 0$. □

Example 3 (Partial derivative). Consider the partial derivative $\frac{\partial^3}{\partial x\, \partial z^2} g(h(x, y, z))$.
The associated multiset is $\{1, 3, 3\}$ with partitions $\{133\}$, $\{13|3\}$, $\{1|33\}$, $\{1|3|3\}$.
The multivariate chain rule tells us that

$$D^{102} g(h(x,y,z)) = Dg\, D^{102}h + 2 D^2 g\, D^{101}h\, D^{001}h + D^2 g\, D^{100}h\, D^{002}h + D^3 g\, D^{100}h\, (D^{001}h)^2,$$

where function arguments have been suppressed on the right hand side for better
readability. In particular we may conclude that

$$\kappa_{102} = m_{102} - 2 m_{101} m_{001} - m_{100} m_{002} + 2 m_{100} m_{001}^2.$$
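
The collapse numbers in this example can be checked by brute force. The following Python sketch assumes the combinatorial reading of the collapse number given by Hardy (2006), namely that $c(\pi)$ counts the set partitions of a labelled copy of the multiset which collapse onto $\pi$ once the labels are forgotten. The function names `set_partitions` and `collapse_numbers` are ours and purely illustrative.

```python
from collections import Counter


def set_partitions(items):
    """Yield all set partitions of a list of distinct items."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition


def collapse_numbers(multiset):
    """Count, for each multiset partition, the set partitions collapsing onto it."""
    labelled = list(enumerate(multiset))  # labels make repeated elements distinct
    counts = Counter()
    for partition in set_partitions(labelled):
        # forget the labels: every block becomes a sorted tuple of values
        collapsed = tuple(sorted(tuple(sorted(v for _, v in block))
                                 for block in partition))
        counts[collapsed] += 1
    return dict(counts)


numbers = collapse_numbers([1, 3, 3])
```

Under this assumption the sketch assigns the collapse numbers 1, 2, 1 and 1 to the partitions $\{133\}$, $\{13|3\}$, $\{1|33\}$ and $\{1|3|3\}$ respectively.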

The expression for cumulants in terms of moments is particularly simple in
what we shall call the square-free case, that is for cumulants $\kappa_k$ whose index
vector $k$ is binary. In that case, the multiset associated with $k$ is degenerate and
$c(\pi) = 1$. Equation (2) simplifies to

$$\kappa_k = \sum_{\pi \in \Pi(k)} (-1)^{|\pi|-1} (|\pi|-1)! \prod_{j=1}^{|\pi|} m_{\nu_{M_j}}.$$
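
As a numerical illustration of the square-free formula, the following Python sketch evaluates the sum over set partitions for a small discrete distribution. The distribution, the 0-based variable indices and the function names are hypothetical choices made for this example only; the third variable is constructed to be independent of the first two, so its joint cumulant with them should vanish.

```python
from math import factorial, prod


def set_partitions(items):
    """Yield all set partitions of a list of distinct items."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def moment(dist, indices):
    """E[prod_{i in indices} X_i] for a finite distribution of (prob, point) pairs."""
    return sum(p * prod(x[i] for i in indices) for p, x in dist)


def square_free_cumulant(dist, indices):
    """Joint cumulant of the distinct variables in `indices` via set partitions."""
    total = 0.0
    for partition in set_partitions(list(indices)):
        b = len(partition)
        total += ((-1) ** (b - 1) * factorial(b - 1)
                  * prod(moment(dist, block) for block in partition))
    return total


# Illustrative distribution: (X1, X2) dependent, X3 independent of both.
pair = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
third = {1: 0.5, 3: 0.5}
dist = [(p * q, (x1, x2, x3))
        for (x1, x2), p in pair.items() for x3, q in third.items()]

k12 = square_free_cumulant(dist, (0, 1))      # covariance of X1 and X2
k123 = square_free_cumulant(dist, (0, 1, 2))  # third-order joint cumulant
```

Here `k12` is the covariance of the first two variables, while `k123` vanishes up to rounding because the third variable is independent of the others.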

In this form it is often stated and derived via the classical Faà di Bruno
formula applied to an exponential function followed by a Möbius inversion (see
e.g. Barndorff-Nielsen & Cox (1989)).



Local analogues to moments and cumulants can be derived by considering
their limiting counterparts in the neighborhood of a fixed point in $\mathbb{R}^p$, an idea
proposed by Mueller & Yan (2001). This section derives formulae for local
moments, local cumulants and the local moment generating function, provided its global
counterpart exists.
For a strictly positive edge length $\epsilon$ in $\mathbb{R}_+$, let $A(\xi, \epsilon) := [\xi - \epsilon/2, \xi + \epsilon/2]^p$ denote
the hypercube centred at $\xi$. Let $|A| = \epsilon^p$ denote its volume. The density of the
random variable $X$ in $\mathbb{R}^p$ conditional on being in $A$ is given by

$$f_X^A(x) := \frac{f_X(x)\, 1_A(x)}{\mathrm{pr}(X \in A)},$$

where $1_A(x)$ is the indicator function which returns unity if $x$ is in $A$ and zero
otherwise. The conditional moments about $\xi$ are denoted by

$$m_k^A := E\left( \prod_{i=1}^p (X_i - \xi_i)^{k_i} \,\Big|\, X \in A \right).$$



Let $2\mathbb{N}$ and $2\mathbb{N}+1$ denote the set of positive even and odd integers respectively. For
symmetry reasons, even and odd orders of individual components have different
effects on local moments, which motivates the following definition:

$$\|a\|_1^+ := \sum_{i=1}^p \left( a_i + 1(a_i \in 2\mathbb{N}+1) \right).$$

$\|\cdot\|_1^+$ increments the total sum of the components of a vector by one additional unit
for each odd component (it is not to be interpreted as a norm).

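Theorem 2 (Local moments). Let $X$ in $\mathbb{R}^p$ be an absolutely continuous random
vector with density $f_X$ which is $p$ times differentiable in $\xi$ in $\mathbb{R}^p$. Let $k$ in $\mathbb{N}^p$
determine the order of moment. Then, for $|A|$ sufficiently small, $X$ has local moment

$$m_k^A = r(\epsilon, k) \left( \frac{D^\alpha f_X(\xi)}{f_X(\xi)} + O(\epsilon^2) \right), \qquad (3)$$

where

$$r(\epsilon, k) := \prod_{i=1,\ k_i \in 2\mathbb{N}}^{p} \frac{\epsilon^{k_i}}{2^{k_i}(k_i + 1)} \prod_{i=1,\ k_i \in 2\mathbb{N}+1}^{p} \frac{\epsilon^{k_i+1}}{2^{k_i+1}(k_i + 2)}, \qquad \alpha := \sum_{i=1,\ k_i \in 2\mathbb{N}+1}^{p} e_i.$$

Proof. Consider

$$m_k^A = \frac{\int_A \prod_{i=1}^p (x_i - \xi_i)^{k_i} f_X(x)\,dx}{\int_A f_X(x)\,dx}. \qquad (4)$$

Approximate $f_X$ through its $p$-th order Taylor expansion, integrate (4) term by
term and exploit the point symmetry of odd order terms about the origin. □

Example 4 (Local moment $m_{120}$). Consider a tri-variate random variable $X$ with
local moment $m_{120}^A = E((X_1 - \xi_1)(X_2 - \xi_2)^2 \mid X \in A)$. Then $r(\epsilon, k) = \epsilon^4/144$, $\alpha :=
(1, 0, 0)'$ and we obtain

$$m_{120}^A = \frac{\epsilon^4}{144} \left( \frac{1}{f_X(\xi)} \frac{\partial f_X(x_1, x_2, x_3)}{\partial x_1} \Big|_{x=\xi} + O(\epsilon^2) \right).$$

A natural way to extend the concept of a local moment is to consider the
limiting case when $\epsilon \to 0$. This leads to our definition of differential moments.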

Definition 3 (Differential moment). The differential moment of an absolutely
continuous random vector $X$ in $\mathbb{R}^p$ in $\xi$ in $\mathbb{R}^p$ is defined as:

$$m_k^\xi := \lim_{\epsilon \to 0} \frac{m_k^A}{r(\epsilon, k)}.$$

Corollary 2 (Differential moment). For a differential moment of order $k$ in $\mathbb{N}^p$ in
$\xi$ in $\mathbb{R}^p$ it holds that

$$m_k^\xi = \frac{D^\alpha f_X(\xi)}{f_X(\xi)}.$$

Proof. This follows from Theorem 2 upon taking the limit as $\epsilon \to 0$. □

From (3) it is clear that the choice of $\alpha$ in the derivative $D^\alpha f_X$ depends only
on the pattern of odd and even components of the moment. To be precise, $\alpha$
holds a unity corresponding to odd components and a zero corresponding to even
component entries. Consequently, the differential moment $m_k^\xi$ depends on $k$ only
via the pattern of odd and even values.

This suggests defining an equivalence relation on $\mathbb{N}^p$: For $u, k \in \mathbb{N}^p$ set
$u \sim_m k \iff m^\xi_{u_1 \ldots u_p} = m^\xi_{k_1 \ldots k_p}$. The relation $\sim_m$ partitions $\mathbb{N}^p$
into $2^p$ equivalence classes of same differential moments. The graph
corresponding to $\sim_m$ is depicted in Figure 1 for the bivariate case. The axes give
the order of the moment for the two components. Different symbols represent
different equivalence classes. For instance, $(2,2) \sim_m (4,2)$, since $m^\xi_{22} = m^\xi_{42}$.
Note that $u \sim_m k$ whenever every component of $u - k$ is even.

Similarly to local moments, for any measurable set $A$ we can define a local
moment generating function:

$$M_X^A(t) := E(e^{t'X} \mid X \in A).$$



[Figure 1 plot. Panel title: Equivalence classes of same differential moment. Horizontal axis: Order of first component.]

Figure 1: Graph of the equivalence classes induced by $\sim_m$ (bivariate case). Each
equivalence class is depicted with a different symbol.

Being a conditional expectation, it exists if Mx exists. We have the following 
expansion: 



Mi{t) 



1 



pr{X e A) 

1=1 



J A 



fx{x)dx 



dxi 



3/x(0 



+ 0(6^) 



d^fx{x) 



1=1 
p p 



dxi 



x=i 



6/x(0 



1=1 j>i 



x=^ 



+ 0(e^) ]+0(e^t^ 



The local moments can be computed from the local moment generating function
via differentiation to appropriate order and evaluation at $t = 0$. The natural
logarithm of the local moment generating function defines the local cumulant
generating function $K_X^A(t)$:

$$K_X^A(t) := \log(M_X^A(t)).$$



Corollary 3 (Local cumulants). Under the conditions of Theorem 2 it holds for
the local cumulants that



7ren(A:) 



/x(0 



where $\alpha_j$ is a function of the partition $\pi$ and defined as

$$\alpha_j := \sum_{i=1,\ \nu_{M_j}(i) \in 2\mathbb{N}+1}^{p} e_i,$$

that is, $\alpha_j$ is binary and holds ones corresponding to odd elements of $\nu_{M_j}$.
Furthermore,



111 



p 

n 



1 



n 



1 



Proof. Combine the chain rule and Theorem 2. □



Similarly to differential moments we can define differential cumulants at $\xi$.
Two different ways of doing so are natural. First, taking the limiting quantity
of the local cumulants as $\epsilon \to 0$ or, second, taking the series of differential
moments and requiring that the mapping between moments and cumulants, which is
induced through the exp-log relation of the associated generating functions, is
preserved; see also the discussion in McCullagh (1987, page 62).

As demonstrated below, the two quantities just described differ in general and
coincide only in the square-free case. In order to retain the intuitive and
familiar relation between cumulants and moments, we define differential cumulants in
terms of differential moments.

Definition 4 (Differential cumulant). For an index vector $k$ in $\mathbb{N}^p$, the differential
cumulant in $\xi$ in $\mathbb{R}^p$ is defined as

$$\kappa_k^\xi := \sum_{\pi \in \Pi(k)} c(\pi) (-1)^{|\pi|-1} (|\pi|-1)! \prod_{j=1}^{|\pi|} m^\xi_{\nu_{M_j}}.$$

We are now in a position to state the main result of this section, namely that
mixed partial derivatives of the log density can be interpreted as differential
cumulants.

Lemma 1 (Differential cumulant). For a differential cumulant in $\xi$ in $\mathbb{R}^p$ of order
$k$ in $\mathbb{N}^p$ it holds that

$$\kappa_k^\xi = D^\alpha \log(f_X(\xi)),$$

where $\alpha := \sum_{i=1,\ k_i \in 2\mathbb{N}+1}^{p} e_i$ projects odd elements of $k$ onto one and even elements of $k$
onto zero.

Proof. Apply the chain rule to $D^\alpha \log(f_X(\xi))$. □
This is a multivariate generalization of the local dependence function
introduced by Holland & Wang (1987). The next theorem relates differential
cumulants to the limit of local cumulants.

Theorem 3 (Differential and limiting local cumulant). A differential cumulant $\kappa_k^\xi$
equals the limit of the local cumulant $\lim_{\epsilon \to 0} \frac{1}{r(\epsilon,k)} \kappa_k^A$ if and only if $k$ is binary, i.e.
$\kappa_k$ is a square-free cumulant.

Proof. First, let $k \in \{0, 1\}^p$ be binary and $\pi = \{(M_j)_{1 \le j \le |\pi|}\}$ be a partition of the
lattice corresponding to $k$. One can show that $r(\epsilon, k) = \prod_{j=1}^{|\pi|} r(\epsilon, \nu_{M_j})$. With that

A/jgTT 

Now take limits as $\epsilon \to 0$ to obtain $\lim_{\epsilon \to 0} \frac{1}{r(\epsilon,k)} \kappa_k^A = \kappa_k^\xi$.

Conversely, suppose $k$ is not binary. Express $\kappa_k^A$ as a linear combination of
local moments. Consider the degenerate partition $\pi$, which holds only one
multiset $M$ with multiplicity $\nu_M = k$. The quantity associated with $\pi$ converges to
fxlo^ for some constant c in R. k not being binary, this cannot be a differential 
moment, which are proportional to $D^\alpha f_X(\xi)$ for some binary $\alpha$. Differential
cumulants are linear combinations of differential moment products only. Hence
$\frac{1}{r(\epsilon,k)} \kappa_k^A$ does not converge to a differential cumulant. □

Of particular interest to us are differential cumulants which vanish
everywhere. We refer to them as zero-cumulants. Writing $g = \log f_X$, we shall usually
write $D^\alpha g = 0$ to denote the zero-cumulant associated with $\alpha$ in the understanding
that this holds for all $x$.

The next section shows that sets of zero-cumulants are isomorphic to
conditional independence statements. As a consequence of Lemma 1, zero-cumulants
are invariant under diagonal transformations of the random vector $X$. In
particular, they are not affected by the probability integral transformation and hence any
result below also holds for the copula density of $X$.



3 Independence and conditional independence 

From now on, we shall assume that $f_X$ is strictly positive everywhere. Sets of
zero-cumulants are equivalent to conditional and unconditional dependency
structures.

Proposition 1 (Independence in the bivariate case). Let $X$ in $\mathbb{R}^2$. Then $X_1 \perp
X_2 \iff \kappa_{11}^x = 0$ for all $x$ in $\mathbb{R}^2$.

Proof.

$$\frac{\partial^2}{\partial x_1 \partial x_2} \log f_X(x_1, x_2) = 0 \iff \log f_X(x_1, x_2) = h_1(x_1) + h_2(x_2)$$

for some functions $h_1, h_2 : \mathbb{R} \to \mathbb{R}$. □

In the multivariate case, we can express conditional independence of any pair
given the remaining variables through square-free differential cumulants.

Proposition 2 (Conditional independence of two random variables). Let $X$ in $\mathbb{R}^p$.
Then

$$X_i \perp X_j \mid X_{-ij} \iff \kappa_k^x = 0 \text{ for all } x \text{ in } \mathbb{R}^p,$$

where

$$X_{-ij} := (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_p)$$

and $k = e_i + e_j$, $(i, j) \in \{1, \ldots, p\}^2$, $i \neq j$.

Proof. By analogy with the bivariate case. □

Setting several square-free differential cumulants to zero simultaneously al- 
lows us to express conditional independence statements. 






Proposition 3 (Multivariate conditional independence). Given three index sets
$I, J, K$ which partition $\{1, \ldots, p\}$, let $S = \{e_i + e_j : i \in I, j \in J\}$. Then

$$X_I \perp X_J \mid X_K \iff \kappa_k^x = 0 \text{ for all } k \in S \text{ and for all } x \text{ in } \mathbb{R}^p.$$

Proof. From Proposition 2 it is clear that this is equivalent to the conditional
independence statement

$$X_I \perp X_J \mid X_K \iff X_i \perp X_j \mid X_{-ij} \text{ for all } (i, j) \in I \times J.$$

Sufficiency ($\Rightarrow$) and necessity ($\Leftarrow$) are semi-graphoid and graphoid axioms
referred to as decomposition and intersection respectively. Both hold true for strictly
positive conditional densities (see for instance Cozman & Walley 2005). □

Pairwise conditional independence of all pairs is equivalent to independence. 

Theorem 4 (Pairwise conditional independence if and only if independence). The
random variables $X_1, \ldots, X_n$ are independent if and only if $\kappa^x_{e_i+e_j} = 0$ for all $(i, j) \in
\{1, \ldots, n\}^2$, $i \neq j$.

Proof. Sufficiency follows from differentiation of the log-density. Necessity can
be proved by induction on the number of variables $n$. The statement is true for
$n = 2$ by Proposition 1. Let the statement be true for $n$ and let the $\binom{n+1}{2}$
differential cumulants $\kappa_{e_i+e_j}$ vanish, where $e_i$ and $e_j$ are unit vectors in $\mathbb{R}^{n+1}$. Consider
$\kappa_{e_1+e_2} = 0$. Integration with respect to $x_1$ and $x_2$ yields

$$f_{X_1,\ldots,X_{n+1}}(x_1, \ldots, x_{n+1}) = e^{h_1(x_{-1}) + h_2(x_{-2})} \qquad (6)$$

for some functions $h_1 : \mathbb{R}^n \to \mathbb{R}$ and $h_2 : \mathbb{R}^n \to \mathbb{R}$. Now integrate again with
respect to $x_1$ to obtain

$$f_{X_{-1}}(x_{-1}) = e^{h_1(x_{-1})} \int e^{h_2(x_{-2})}\,dx_1.$$

The left hand side is an $n$-dimensional marginal density which factorises into $n$
marginals by induction assumption: $f_{X_{-1}}(x_{-1}) = \prod_{i=2}^{n+1} f_{X_i}(x_i)$. This allows us to
conclude that $h_1(x_{-1})$ can be split into a sum of two functions, $g_1 : \mathbb{R}^{n-1} \to \mathbb{R}$
and $g_2 : \mathbb{R} \to \mathbb{R}$, where the latter is a function of $x_2$ only, i.e. $h_1(x_{-1}) =
g_1(x_{-12}) + g_2(x_2)$. Considering (6) again we see that the density $f_{X_1,\ldots,X_{n+1}}$
factorises:

$$f_{X_1,\ldots,X_{n+1}}(x_1, \ldots, x_{n+1}) = e^{g_2(x_2)}\, e^{g_1(x_{-12}) + h_2(x_{-2})}.$$

Hence $X_2 \perp X_{-2}$ and the density of $X_{-2}$ factorises by induction assumption. □


4 Hierarchical models



The analysis of the last section makes clear that setting certain mixed two-way
partial derivatives of $g(x) = \log f_X(x)$ equal to zero is equivalent to
independence or conditional independence statements. We can go further and define a
generalized hierarchical model using the same process.

The basic structure of a hierarchical model can be defined via a simplicial
complex. Thus let $\mathcal{N} = \{1, \ldots, p\}$ be the vertex set representing the random variables
$X_1, \ldots, X_p$. A collection $S$ of index sets $J \subseteq \mathcal{N}$ is a simplicial complex if it is
closed under taking subsets, i.e. if $J$ in $S$ and $K \subseteq J$ then $K$ in $S$.

Definition 5. Given a simplicial complex $S$ over an index set $\mathcal{N} = \{1, \ldots, p\}$
and an absolutely continuous random vector $X$, a hierarchical model for the joint
density function $f_X(x)$ takes the form:

$$f_X(x) = \exp\left\{ \sum_{J \in S} h_J(x_J) \right\},$$

where $h_J : \mathbb{R}^{|J|} \to \mathbb{R}$ and $x_J$ in $\mathbb{R}^{|J|}$ is the canonical projection of $x$ in $\mathbb{R}^p$ onto the
subspace associated with the index set $J$.

This is equivalent to a quasi-additive model for $g(x) = \sum_{J \in S} h_J(x_J)$, and
we also refer to this model for $g(x)$ as being hierarchical. It is clear that we may
write the model over the maximal cliques only, namely simplexes which are not
contained in a larger simplex. In the terminology of Lauritzen (1996) we require
$f_X$ be positive and factorise according to $S$ for it to be a hierarchical model with
respect to $S$.

Associated to an index set $K \subseteq \mathcal{N}$ is a differential operator $D^k$, where $k =
\sum_{i \in K} e_i \in \{0, 1\}^p$ holds ones for every member of $K$ and zeros otherwise. In the
following, we overload the differential operator by allowing it to be superscripted
by a set or by a vector. Thus, for an index set $K$ we set $D^K := D^k$ and similarly
$\kappa_K^x := \kappa_k^x$. $D^K$ returns the differential cumulant $\kappa_K^x$ when applied to $g(x)$.



Example 5. Let $K = \{2, 4, 6\}$. We obtain $k = (0, 1, 0, 1, 0, 1)$ and

$$D^K g(x) = \kappa_K^x = \kappa_k^x = \frac{\partial^3}{\partial x_2\, \partial x_4\, \partial x_6} g(x).$$

We collect the results of the last section into a comprehensive statement. First,
we define the complementary complex to a simplicial complex $S$ on $\mathcal{N}$.
Definition 6. Given a simplicial complex $S$ on an index set $\mathcal{N}$ we define the
complementary complex as the collection $\bar{S}$ of every index set $K$ which is not a
member of $S$.

Note immediately that $\bar{S}$ is closed under unions, i.e. $K, K'$ in $\bar{S} \Rightarrow K \cup K' \in
\bar{S}$. It is a main point of this paper that there is a duality between setting collections
of mixed differential cumulants equal to zero and a general hierarchical model:

Theorem 5. Given a simplicial complex $S$ on an index set $\mathcal{N}$, a model $g$ is
hierarchical, based on $S$, if and only if all differential cumulants on the complementary
complex vanish everywhere, that is

$$\kappa_K^x = 0 \text{ for all } x \in \mathbb{R}^p \text{ and for all } K \text{ in } \bar{S}.$$

Proof. First, let $g$ be hierarchical with respect to $S$, that is $g$ is a log-density with
representation $g(x) = \sum_{J \in S} h_J(x_J)$. Then, for $K$ in $\bar{S}$, the associated
differential operator $D^K$ annihilates any term $h_J$ in $g$, since $K \not\subseteq J$ for any $J$ in $S$.

Conversely, suppose $\kappa_K^x = 0$ for all $x$ in $\mathbb{R}^p$ and for all $K$ in $\bar{S}$. Then, by
Proposition 2, $f_X$ is pairwise Markov with respect to $S$ and hence factorises over
maximal cliques of $S$ by the Hammersley-Clifford theorem. The reader is
referred to Lauritzen (1996) for a detailed discussion of factorization and Markov
properties. □
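
The first half of the proof can be illustrated numerically. The Python sketch below approximates a mixed second derivative by central finite differences and applies it to two test functions on $\mathbb{R}^3$. Both functions, the evaluation point and the step size are hypothetical choices; they serve only as smooth test functions and are not claimed to be normalisable log-densities. Variables are indexed from 0 in the code.

```python
import math


def mixed_partial(g, x, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 g / (dx_i dx_j) at the point x."""
    def shifted(di, dj):
        y = list(x)
        y[i] += di * h
        y[j] += dj * h
        return g(y)
    return (shifted(1, 1) - shifted(1, -1)
            - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


def g_hierarchical(x):
    # maximal cliques {1,3} and {2,3}: no term couples x1 and x2
    return math.sin(x[0] * x[2]) - (x[1] - x[2]) ** 2


def g_saturated(x):
    # the extra three-way term couples x1 and x2
    return g_hierarchical(x) + 0.5 * x[0] * x[1] * x[2]


point = [0.3, -0.7, 1.1]
d_hier = mixed_partial(g_hierarchical, point, 0, 1)  # close to zero
d_sat = mixed_partial(g_saturated, point, 0, 1)      # close to 0.5 * x3
```

For the first function the index set $\{1, 2\}$ lies in the complementary complex and the estimated derivative vanishes up to rounding; for the second it does not.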



5 The duality with monomial ideals 

The growing area of algebraic statistics makes use of computational commutative
algebra particularly for discrete probability models, notably Poisson and
multinomial log-linear models. Work connecting the algebraic methods to continuous
probability models is sparser although considerable progress has been made in the
Gaussian case. For an overview see Drton et al. Our link to the algebra is
via monomial ideals.

A monomial is a product of the form $x^\alpha = \prod_{j=1}^p x_j^{\alpha_j}$, where
$\alpha$ in $\mathbb{N}^p$. A monomial ideal $I$ is a subset of a polynomial ring $k[x_1, \ldots, x_p]$ such that
any $m \in I$ can be written as a finite polynomial combination $m = \sum_{k \in K} h_k x^{\alpha_k}$,
where $h_k \in k[x_1, \ldots, x_p]$ and $\alpha_k \in \mathbb{N}^p$ for all $k \in K$. We write $I = \langle x^{\alpha_k} \rangle_{k \in K}$
to express that $I$ is generated by the family of monomials $(x^{\alpha_k})_{k \in K}$.

The full set $M$ of monomials contained in a monomial ideal $I$ has the
hierarchical structure:

$$x^\alpha \in M \Rightarrow x^{\alpha+\gamma} \in M, \qquad (7)$$

for any $\gamma \in \mathbb{N}^p$. A monomial ideal is square-free if its generators
$(x^{\alpha_k})_{1 \le k \le K}$ are square-free, i.e. $\alpha_k \in \{0, 1\}^p$ for all $1 \le k \le K$.

The following discussion shows that there is complete duality between the
structure of square-free monomial ideals and hierarchical models. Associated with
a simplicial complex $S$ is its Stanley-Reisner ideal $I_S$. This is the ideal generated
by all square-free monomials in the complementary complex $\bar{S}$. For an index set $K \in \bar{S}$
let $m_K(x) := \prod_{k \in K} x_k$ denote the associated square-free monomial. Then

$$I_S = \langle (m_K)_{K \in \bar{S}} \rangle.$$
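
A generating set of the Stanley-Reisner ideal can be computed directly from the maximal cliques. The Python sketch below relies on the standard fact that the minimal non-faces of $S$ already generate the ideal, since every other monomial in the complementary complex is a multiple of one of them. The function names and the representation of index sets as tuples are our own illustrative choices.

```python
from itertools import combinations


def is_face(subset, facets):
    """A subset is a face if it is contained in some maximal clique."""
    return any(set(subset) <= set(facet) for facet in facets)


def stanley_reisner_generators(vertices, facets):
    """Return the minimal non-faces, i.e. a minimal generating set of I_S."""
    generators = []
    for size in range(1, len(vertices) + 1):
        for subset in combinations(sorted(vertices), size):
            if is_face(subset, facets):
                continue
            # keep a non-face only if it contains no smaller generator
            if not any(set(g) <= set(subset) for g in generators):
                generators.append(subset)
    return generators


chain = stanley_reisner_generators([1, 2, 3], [(1, 3), (2, 3)])
four_cycle = stanley_reisner_generators([1, 2, 3, 4],
                                        [(1, 2), (2, 3), (3, 4), (1, 4)])
three_cycle = stanley_reisner_generators([1, 2, 3], [(1, 2), (1, 3), (2, 3)])
```

Each returned tuple $K$ stands for the square-free monomial $m_K(x)$; for instance `(1, 2)` stands for $x_1 x_2$.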

The second step, which is a main point of the paper, is to associate the
differential operator $D^K$ with the monomial $m_K(x)$. We need only confirm that the
hierarchical structure implied by (7) is consistent with the differential conditions of
Theorem 5.

Without loss of generality include all differential operators which are obtained
by continued differentiation. Then, (7) is mapped exactly to

$$D^\alpha g(x) = 0 \text{ for all } x \in \mathbb{R}^p \ \Rightarrow\ D^{\alpha+\gamma} g(x) = 0 \text{ for all } x \in \mathbb{R}^p,$$

simply by continued differentiation. This bijective mapping from monomial
ideals into differential operators is sometimes referred to as a "polarity" and within
differential ideal theory has its origins in "Seidenberg's differential
Nullstellensatz" (Seidenberg 1956). It allows us to map properties of hierarchical models in
statistics to monomial ideal properties and vice versa.

One of the main conditions discussed in the theory of hierarchical models in
statistics is the decomposability of a joint density function into a product of certain
marginal probabilities. Simple conditional probability is a canonical case. Thus
with $p = 3$ the conditional independence $X_1 \perp X_2 \mid X_3$ is represented by the graph
1 - 3 - 2. In this case the graph has the model simplicial complex $S = \{13, 23\}$,
where, again, we write $S$ in terms of its maximal cliques. The Stanley-Reisner
ideal is $I_S = \langle x_1 x_2 \rangle$.

There is a factorization:

$$f_{X_1,X_2,X_3}(x_1, x_2, x_3) = \frac{f_{X_1,X_3}(x_1, x_3)\, f_{X_2,X_3}(x_2, x_3)}{f_{X_3}(x_3)}.$$

Decomposable graphical models, discussed below, are a generalization of this
simple case. There are other cases, however, where one or more factorizations are
associated with the same simplicial complex. An example is the 4-cycle: $S =
\{12, 23, 34, 41\}$ with the Stanley-Reisner ideal $I_S = \langle x_1x_3, x_2x_4 \rangle$. Although
this ideal is rather simple from an algebraic point of view, the 4-cycle is
rather complex from a statistical point of view. By considering special ideals we obtain
general classes of models, in Subsection 5.2.

Another issue is that the structure of $S$ may suggest factorizations even when
they are problematical. Perhaps the first such case is the 3-cycle: $S = \{12, 13, 23\}$.
The Stanley-Reisner ideal is $I_S = \langle x_1x_2x_3 \rangle$. The maximal clique log-density
representation has no three-way interaction:

$$g(x_1, x_2, x_3) = h_{12}(x_1, x_2) + h_{13}(x_1, x_3) + h_{23}(x_2, x_3).$$

This might suggest the factorization

$$f_{X_1,X_2,X_3}(x_1, x_2, x_3) = \frac{f_{X_1,X_2}(x_1, x_2)\, f_{X_1,X_3}(x_1, x_3)\, f_{X_2,X_3}(x_2, x_3)}{f_{X_1}(x_1)\, f_{X_2}(x_2)\, f_{X_3}(x_3)}. \qquad (8)$$

A factorization of this kind is the continuous analogue to a perfect three-dimensional
table in the discrete case (Darroch 1962). However, except when $X_1, X_2, X_3$ are
independent we have not been able to provide a standard density for which (8)
holds.

5.1 Decomposability and marginality 

Our use of the index set notation makes it straightforward to define
decomposability.

Definition 7. Let $\mathcal{N} = \{1, \ldots, p\}$ be the vertex set of a graph $\mathcal{G}$ and $I, J$ vertex
sets such that $I \cup J = \mathcal{N}$. Then $\mathcal{G}$ is decomposable if and only if $I \cap J$ is complete
and $I$ forms a maximal clique or the subgraph based on $I$ is decomposable and
similarly for $J$.



Under this condition the corresponding hierarchical model has a factorization

$$f_V(x_V) = \frac{\prod_{C} f_C(x_C)}{\prod_{D} f_D(x_D)},$$

where the numerator on the right hand side is a product over the cliques $C$ and the
denominator a product over the separators $D$ which arise in the continued factorization under the definition.

It is important to realize that in order to proceed with the factorization at each
stage a marginalisation step is required. Consider the simple case based on the
simplicial complex $S = \{123, 234, 345\}$. One choice of factorization at first stage
is (with simplified notation):

$$f_{12345} = \frac{f_{123}\, f_{2345}}{f_{23}}$$

and we continue the factorization to give

$$f_{12345} = \frac{f_{123}\, f_{234}\, f_{345}}{f_{23}\, f_{34}}.$$

Figure 2: Factorization and marginalisation of a hierarchical model.

The process of marginalisation is shown in Figure 2. At any stage, we may choose
to marginalise with respect to any variable that is a member of just a single clique.
In the first step these are $X_1$ and $X_5$; suppose we choose to single out $X_1$. Once
$f_X$ has been integrated with respect to $x_1$, the marginal model for $X_2, \ldots, X_5$ is
obtained. The removal of the clique 123 leads to $X_2$ being exposed and we may
continue with $X_2$ or $X_5$ etc.

The Stanley-Reisner ideal $I_S = \langle x_1x_4, x_1x_5, x_2x_5 \rangle$ is an ideal in $k[x_1, x_2, x_3, x_4, x_5]$.
The factorization of $f_{2345}$ is, however, mapped into the monomial ideal $\langle x_2x_5 \rangle$
which is an ideal in $k[x_2, x_3, x_4, x_5]$. A marginalisation has allowed us to drop
from five dimensions to four. This is clear from the exponential expression of the
model:

$$f_{12345} = \exp\{h_{123}(x_1, x_2, x_3) + h_{234}(x_2, x_3, x_4) + h_{345}(x_3, x_4, x_5)\}.$$

Integrating with respect to $x_1$ we obtain a hierarchical model for the marginal
joint distribution of $(X_2, X_3, X_4, X_5)$. This marginalisation is possible because
$x_1$ appears only in the single clique $\{1, 2, 3\}$.

We have exposed an interesting relationship between the statistical and algebraic
formulations: in order to reduce the dimensionality and obtain the Stanley-Reisner
ideal for a reduced set of variables, we must first perform a marginalisation, which
is a non-algebraic operation, at least not in general a finite-dimensional operation.
We capture this in the following lemma:

Lemma 2. Whenever a simplicial complex of a hierarchical model has a subset
of vertices which form a facet of a unique maximal clique (simplex) then the
marginal model obtained by deleting this facet (and its connections) is valid.
Moreover, the monomial ideal representation is obtained by deleting any
generators containing the corresponding variables and is in the ring without these
variables.

Proof. This follows the lines of the example. If $J$ is the subset of vertices and
$K$, with $J \subseteq K$, is the unique maximal clique, then in the exponential expression
for the density there will be a unique term $\exp(h_K(x_K))$ in which $x_J$ appears.
Integrating with respect to $x_J$ to obtain the marginal distribution for $X_{V \setminus J}$ gives
the reduced model. The monomial ideal representation follows accordingly. □
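
The algebraic half of Lemma 2 amounts to a simple filtering of generators. The Python sketch below applies it to the five-variable example of this subsection; the function name `marginal_generators` and the tuple encoding of monomials are illustrative assumptions. The sketch covers only the bookkeeping on the ideal, not the integration step itself.

```python
def marginal_generators(generators, removed):
    """Drop every generator that involves one of the removed variables."""
    removed = set(removed)
    return [g for g in generators if not removed & set(g)]


# Generators x1*x4, x1*x5, x2*x5 of the ideal for S = {123, 234, 345}
full = [(1, 4), (1, 5), (2, 5)]

# Marginalising over x1, which lies in the single clique {1, 2, 3}
marginal = marginal_generators(full, removed={1})
```

The remaining generator `(2, 5)` corresponds to the ideal generated by $x_2x_5$ in the ring without $x_1$.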



5.2 Artinian closure and polynomial exponential models 

The terms $h_J(x_J)$ which appear in the hierarchical models have not been given
any special form. In fact it is a main point of this paper that this is not required to
give the monomial ideal equivalence. We note, again, that we always use
square-free monomial ideals.

Certain classes of hierarchical models can, however, be obtained by imposing
further differential conditions. The following lemma shows that the log-density is
polynomial if we impose univariate derivative restrictions.

Lemma 3. If in addition to the differential conditions in Theorem 5 we impose
conditions of the form

$$\frac{\partial^{n_i}}{\partial x_i^{n_i}} g(x) = 0, \text{ for all } 1 \le i \le p \text{ and } n \in \mathbb{N}^p, \qquad (9)$$

the $h$-functions in the corresponding hierarchical model are polynomials, in which
the degree of $x_i$ does not exceed $n_i - 1$, for all $1 \le i \le p$.

Proof. Repeated integration with respect to $x_i$ shows that $g$ is indeed a polynomial
in $x_i$ of degree less than $n_i$, when the other variables are fixed. Since this holds for
all $1 \le i \le p$ the result follows. □

The simultaneous inclusion of derivative operators with respect to one
indeterminate in (9) constitutes an Artinian closure of the differential version of the
Stanley-Reisner ideal $I_S$.

Example 6 (BEC density). Suppose $X$ is bivariate and we impose the symmetric
Artinian closure conditions

$$\frac{\partial^2}{\partial x_i^2} g(x_1, x_2) = 0, \text{ for } i = 1, 2.$$

Then integration yields

$$g(x_1, x_2) = x_1 h_1(x_2) + h_2(x_2)$$

and

$$g(x_1, x_2) = x_2 h_3(x_1) + h_4(x_1).$$

A comparison of these functionals identifies $h_1(x_2) = a_3x_2 + a_1$, $h_2(x_2) =
a_0 + a_2x_2$, $h_3(x_1) = a_3x_1 + a_2$, $h_4(x_1) = a_1x_1 + a_0$, for some $a_i \in \mathbb{R}$ for all
$0 \le i \le 3$, so that $g(x_1, x_2)$ can be written as

$$g(x_1, x_2) = a_0 + a_1x_1 + a_2x_2 + a_3x_1x_2. \qquad (10)$$

It can be shown that $X_1$ is distributed exponentially conditional on $X_2 = x_2$
for all $x_2 > 0$ and vice versa. A distribution with that property is called a
bivariate exponential conditionals (BEC) distribution. BEC distributions are
completely described by $g$ in the sense that any BEC density is of the form (10)
(Arnold & Strauss 1988). In particular, the independence case is included, if we
force $a_3 = 0$ by imposing the additional restriction

$$\frac{\partial^2}{\partial x_1 \partial x_2} g(x_1, x_2) = 0.$$

This confirms Proposition 1 for this particular example.
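
The differential conditions of this example are easy to verify mechanically for a bilinear log-density. The Python sketch below stores a polynomial as a dictionary from exponent tuples to coefficients and differentiates it term by term. The representation, the function name `differentiate` and the numerical coefficient values are hypothetical and chosen only for illustration; variables are indexed from 0 in the code.

```python
def differentiate(poly, var):
    """Differentiate a polynomial {exponent tuple: coefficient} w.r.t. variable `var`."""
    result = {}
    for exps, coeff in poly.items():
        if exps[var] == 0:
            continue  # this term does not depend on the variable
        new_exps = list(exps)
        new_exps[var] -= 1
        key = tuple(new_exps)
        result[key] = result.get(key, 0) + coeff * exps[var]
    return result


# g(x1, x2) = a0 + a1*x1 + a2*x2 + a3*x1*x2 with illustrative coefficients
a0, a1, a2, a3 = 1.0, -2.0, -3.0, -0.5
g = {(0, 0): a0, (1, 0): a1, (0, 1): a2, (1, 1): a3}

d11 = differentiate(differentiate(g, 0), 0)  # second derivative in x1
d22 = differentiate(differentiate(g, 1), 1)  # second derivative in x2
d12 = differentiate(differentiate(g, 0), 1)  # mixed derivative
```

Both pure second derivatives are the zero polynomial (an empty dictionary), while the mixed derivative is the constant $a_3$, which vanishes exactly in the independence case.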

The previous example extends readily into higher dimensions. We call a
distribution a multivariate exponential conditionals (MEC) distribution if $X_i$ is
distributed exponentially conditional on $X_j = x_j$ for all $1 \le i, j \le p$, $i \neq j$. We
capture the extension to the $p$-dimensional case in the following lemma:

Lemma 4 (MEC distributions and Artinian closure). The following statements are
equivalent:

1. A distribution belongs to the class of MEC distributions.

2. $g$ is multi-linear, i.e. there exist $2^p$ constants $a_s \in \mathbb{R}$ such that $g = \sum_{s \in C} a_s x^s$,
where $C = \{0, 1\}^p$ denotes the set of $p$-dimensional binary vectors.

3. $\frac{\partial^2}{\partial x_i^2} g(x) = 0$, for all $1 \le i \le p$.

Proof. For a proof of (1) $\iff$ (2) see Arnold & Strauss (1988). The proof of
(2) $\iff$ (3) follows the lines of the example. □
Another case of considerable importance is the Gaussian distribution. Here 

Kes 

and the maximal cliques are of degree two. The latter condition is partly obtained
with an Artinian closure with $n_i = 3$, $i = 1, \ldots, p$. However, more is
required. We can guess, from the fact that for a normal distribution all
(ordinary) cumulants of degree three and above are zero, that if we impose all
degree-three differential cumulants to be zero we obtain polynomial terms of
maximum degree 2. This is, in fact, the correct set of conditions to make the
model terms of degree at most two. In the $\alpha$-notation the conditions are

$$D^\alpha g = 0, \quad \text{for all } \alpha \in \mathbb{N}^p \text{ with } \|\alpha\|_1 = 3,$$

which includes the Artinian closure conditions. The corresponding ideal is
generated by all polynomials of degree three. For a non-singular multivariate
Gaussian, we, of course, require non-negative definiteness of the degree-two part
of the model, considered as a quadratic form.
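To make the statement "more is required" concrete, the following sketch enumerates the degree-three conditions for $p = 3$ and applies them to two test polynomials. The dictionary representation of polynomials and the coefficient values are assumptions made for illustration.

```python
from itertools import combinations_with_replacement
from math import comb

def multi_indices(p, degree):
    """All alpha in N^p with |alpha|_1 = degree."""
    out = []
    for combo in combinations_with_replacement(range(p), degree):
        alpha = [0] * p
        for i in combo:
            alpha[i] += 1
        out.append(tuple(alpha))
    return out

def D(poly, alpha):
    """Apply D^alpha to a polynomial stored as {exponent tuple: coefficient}."""
    out = {}
    for beta, c in poly.items():
        if all(b >= a for a, b in zip(alpha, beta)):
            for a, b in zip(alpha, beta):
                for k in range(a):
                    c *= b - k  # falling factorial b (b-1) ... (b-a+1)
            gamma = tuple(b - a for a, b in zip(alpha, beta))
            out[gamma] = out.get(gamma, 0) + c
    return out

p = 3
conditions = multi_indices(p, 3)

# Illustrative quadratic g = 1 + 2 x1 - x1^2 - x2^2 - x3^2 + 0.5 x1 x2.
g = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (2, 0, 0): -1.0,
     (0, 2, 0): -1.0, (0, 0, 2): -1.0, (1, 1, 0): 0.5}
quadratic_ok = all(D(g, a) == {} for a in conditions)

# x1 x2 x3 passes the Artinian closure (n_i = 3) but fails the full set.
cubic = {(1, 1, 1): 1.0}
artinian = [(3, 0, 0), (0, 3, 0), (0, 0, 3)]
passes_artinian = all(D(cubic, a) == {} for a in artinian)
mixed = D(cubic, (1, 1, 1))
```

For $p = 3$ there are `comb(5, 3)` conditions. The quadratic satisfies all of them, whereas the monomial $x_1 x_2 x_3$ is only removed by the mixed condition $\alpha = (1, 1, 1)$.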

The hierarchical model is given by additional restrictions which are equivalent
to removing certain terms of the form $x_i x_j$, $i \ne j$. This is the same as
setting the corresponding $(i,j)$-th entry in the inverse covariance matrix
(influence matrix) equal to zero. The removed $x_i x_j$ generate the
Stanley-Reisner ideal so that the zero structure of the influence matrix
completely determines the ideal.
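A minimal sketch of this correspondence, assuming the influence matrix is given as a nested list: the function name and the example matrix below are hypothetical and serve only to show how the zero pattern is read off as a list of degree-two generators.

```python
def stanley_reisner_generators(K, tol=0.0):
    """Index pairs (i, j), i < j, with a zero off-diagonal entry K[i][j].

    Each pair stands for the generator x_i x_j (1-based labels).
    """
    p = len(K)
    return [(i + 1, j + 1)
            for i in range(p)
            for j in range(i + 1, p)
            if abs(K[i][j]) <= tol]

# Hypothetical tridiagonal influence matrix (a chain 1 - 2 - 3 - 4).
K = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]

generators = stanley_reisner_generators(K)
```

For this chain the zero entries give the generators $x_1 x_3$, $x_1 x_4$ and $x_2 x_4$. With estimated matrices a positive `tol` would be needed, and choosing it is a statistical question outside this sketch.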
5.3 Ideal-generated models



The duality between monomial ideals and hierarchical models encourages the
investigation of the properties of hierarchical models for different types of
ideals. There are some important properties and features of monomial ideals which
may be linked to the corresponding hierarchical models and we mention just a few
here in an attempt to introduce a larger research programme.

We begin with the sub-class of decomposable models. It is well known from the
statistical literature (see Lauritzen 1996) that the decomposability property of
the model based on a simplicial complex $S$ is equivalent to the chordal
property: there is no chord-less cycle of length four or more. Remarkably, the
latter is equivalent to a property of the Stanley-Reisner ideal $I_S$, namely
that the minimal free resolution of $I_S$ be linear (see below for a brief
explanation). This is a result of Fröberg (1988), see also Dochtermann &
Engström (2009). Petrovic & Stokes (2010) adapt a result of Geiger et al. (2006)
to show that $I_S$, in this case, is generated in degree 2, that is, all its
generators have degree 2.

Theorem 6. For a decomposable graphical model, the Stanley-Reisner ideal $I_S$
has a "2-linear" resolution.

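The term linear refers to the structure of the minimal free resolution of $I_S$.
In this resolution there are monomial maps between the stages of the resolution
sequence. Linear means that these maps, beyond the generators themselves, have
linear entries. As a simple example consider again the simplicial complex
$S = \{123, 234, 345\}$ with Stanley-Reisner ideal
$I_S = \langle x_1x_4, x_1x_5, x_2x_5 \rangle$. Writing $R$ for the polynomial
ring, the minimal free resolution of $I_S$ is given by:

$$0 \longrightarrow R^2 \xrightarrow{\begin{bmatrix} x_5 & 0 \\ -x_4 & x_2 \\ 0 & -x_1 \end{bmatrix}} R^3 \xrightarrow{[x_1x_4,\ x_1x_5,\ x_2x_5]} I_S \longrightarrow 0,$$

and one sees that the map is linear. By contrast, the ideal of the 4-cycle is
generated in degree 2, but is not linear:

$$0 \longrightarrow R \xrightarrow{\begin{bmatrix} x_2x_4 \\ -x_1x_3 \end{bmatrix}} R^2 \xrightarrow{[x_1x_3,\ x_2x_4]} I_S \longrightarrow 0,$$

giving a non-linear map.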
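The defining property of these maps, namely that each column of the second map is a relation (syzygy) among the generators, can be verified by direct multiplication. The sketch below uses a small hand-written polynomial arithmetic (an assumption made here to stay self-contained; a computer algebra system would normally be used) and the syzygies $x_5 e_1 - x_4 e_2$, $x_2 e_2 - x_1 e_3$ for the first ideal and $x_2x_4 e_1 - x_1x_3 e_2$ for the 4-cycle.

```python
def mono(indices, n, coeff=1):
    """Monomial coeff * prod x_i (1-based indices) in n variables."""
    e = [0] * n
    for i in indices:
        e[i - 1] += 1
    return {tuple(e): coeff}

def mul(f, g):
    out = {}
    for a, c in f.items():
        for b, d in g.items():
            k = tuple(x + y for x, y in zip(a, b))
            out[k] = out.get(k, 0) + c * d
    return {k: v for k, v in out.items() if v != 0}

def add(f, g):
    out = dict(f)
    for k, v in g.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}

def apply_row(gens, column):
    """Row of generators times one column of the syzygy matrix."""
    total = {}
    for g, s in zip(gens, column):
        total = add(total, mul(g, s))
    return total

def max_degree(column):
    return max(sum(k) for s in column for k in s)

# S = {123, 234, 345}: generators x1x4, x1x5, x2x5 and two linear syzygies.
gens_path = [mono([1, 4], 5), mono([1, 5], 5), mono([2, 5], 5)]
syz_path = [[mono([5], 5), mono([4], 5, -1), {}],
            [{}, mono([2], 5), mono([1], 5, -1)]]

# 4-cycle: generators x1x3, x2x4 and one quadratic syzygy.
gens_cycle = [mono([1, 3], 4), mono([2, 4], 4)]
syz_cycle = [[mono([2, 4], 4), mono([1, 3], 4, -1)]]

path_relations = [apply_row(gens_path, col) for col in syz_path]
cycle_relations = [apply_row(gens_cycle, col) for col in syz_cycle]
```

Each product is the zero polynomial (an empty dictionary). The entries of `syz_path` have degree one, while those of `syz_cycle` have degree two. This checks only that the columns are syzygies; it does not by itself prove minimality of the resolution.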

A special case of 2-linear resolutions are Ferrer ideals. A Ferrer ideal $I_S$ is
one in which the degree-two linear generators can be placed in a table with an
inverse stair-case. Such staircases arose historically in the study of integer
partitions. As an example take the Stanley-Reisner ideal

$$I_S = \langle x_1x_6,\ x_1x_7,\ x_1x_8,\ x_2x_6,\ x_2x_7,\ x_3x_6,\ x_3x_7,\ x_4x_6,\ x_5x_6 \rangle \subset k[x_1, \ldots, x_9].$$
The Ferrer table is:

         6           7           8           9
  1   $x_1x_6$    $x_1x_7$    $x_1x_8$
  2   $x_2x_6$    $x_2x_7$
  3   $x_3x_6$    $x_3x_7$
  4   $x_4x_6$
  5   $x_5x_6$





Considering the non-empty cells as given by edges this corresponds to a special
type of bi-partite graph between nodes $\{1, 2, 3, 4, 5\}$ and $\{6, 7, 8, 9\}$.
Corso & Nagel (2009) show that, among the class of bi-partite graphs, Ferrer
ideals are indeed uniquely characterized as having a 2-linear minimal free
resolution.
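The staircase condition is easy to test for a given labelling. The following sketch assumes the row and column orderings are supplied (it does not search over relabellings), and the function name is hypothetical.

```python
def ferrers_shape(pairs, rows, cols):
    """Row lengths if the cells form a staircase in the given order, else None."""
    lengths = []
    for r in rows:
        present = [c for c in cols if (r, c) in pairs]
        # Cells in a row must be left-justified: an initial segment of cols.
        if present != cols[:len(present)]:
            return None
        lengths.append(len(present))
    # Row lengths must be non-increasing, as in an integer partition.
    if any(a < b for a, b in zip(lengths, lengths[1:])):
        return None
    return lengths

rows, cols = [1, 2, 3, 4, 5], [6, 7, 8, 9]
generators = {(1, 6), (1, 7), (1, 8), (2, 6), (2, 7),
              (3, 6), (3, 7), (4, 6), (5, 6)}

shape = ferrers_shape(generators, rows, cols)

# A perfect matching on two rows and two columns is not a staircase.
not_ferrers = ferrers_shape({(1, 6), (2, 7)}, [1, 2], [6, 7])
```

For the example ideal the row lengths are 3, 2, 2, 1, 1, a partition of 9, matching the nine generators.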

It is straightforward to show that the corresponding hierarchical model is
decomposable by exhibiting the decomposition given by Lemma 2. First take two
simplices based on the variables defining, respectively, the rows and columns. In
the example these are $J_1 = 12345$, $J_2 = 6789$. Then join all nodes
corresponding to the complement of the Ferrer diagram to give:





         6           7           8           9
  1                                       $x_1x_9$
  2                           $x_2x_8$    $x_2x_9$
  3                           $x_3x_8$    $x_3x_9$
  4               $x_4x_7$    $x_4x_8$    $x_4x_9$
  5               $x_5x_7$    $x_5x_8$    $x_5x_9$



The maximal cliques are easily seen to be given by a simple rule on this
complementary table. For each non-empty row take the variable which defines that
row together with the variables for the non-empty columns in that row and all the
variables for the rows below that row. In this example we find, working down the
rows, that the maximal cliques are:

123459, 234589, 34589, 45789, 5789.

Note how Lemma 2 can be applied to this example by successively stripping off
variables in the order $x_1, x_2, x_3, x_4, x_5$. The separators are
23459, 34589, 4589, 5789, 789. The rule provides a proof of the following.

Theorem 7. Hierarchical models generated by Ferrer ideals are decomposable. 
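The row rule described above can be sketched in a few lines. The code applies the rule literally, producing one set per non-empty row of the complementary table without any further maximality check, and it reads the separator for each row as the corresponding set with the row variable removed; that reading is an assumption which happens to reproduce the list in the text.

```python
rows, cols = [1, 2, 3, 4, 5], [6, 7, 8, 9]
generators = {(1, 6), (1, 7), (1, 8), (2, 6), (2, 7),
              (3, 6), (3, 7), (4, 6), (5, 6)}

# Complementary table: cells (row, column) that are NOT generators of I_S.
complement = {r: [c for c in cols if (r, c) not in generators] for r in rows}

def label(vertices):
    return "".join(str(v) for v in sorted(vertices))

cliques, separators = [], []
for k, r in enumerate(rows):
    if complement[r]:                      # only non-empty rows contribute
        below = rows[k + 1:]               # variables for the rows below
        rest = below + complement[r]
        cliques.append(label([r] + rest))
        separators.append(label(rest))     # set left after stripping x_r
```

Running this on the example gives the five sets and five separators listed in the text, in the same order.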

As another illustration of the duality between monomial ideals and conditional
independence structures, we next consider two-terminal networks. In Sáenz de
Cabezón & Wynn (2010) the authors apply the theory and construction of minimal
free resolutions to the theory of reliability. One sub-class of these
applications concerns networks, in the classical sense of network reliability.
Consider a connected graph $G = (E, V)$, with two identified nodes called input
and output, respectively. A cut is a set of edges which, if removed from the
graph, disconnects input and output. A path is a connected set of edges from
input to output. A minimal cut is a cut for which no proper subset is a cut and
a minimal path is a path for which no proper subset is a path.

Figure 3: A two-terminal network.

As a simple example consider the network depicted in Figure 3 with input = 1
and output = 4 and edges:

$e_1$ = 1 - 2, $e_2$ = 2 - 4, $e_3$ = 2 - 3, $e_4$ = 1 - 3, $e_5$ = 3 - 4.

The minimal cuts are $\{e_1, e_4\}$, $\{e_2, e_5\}$, $\{e_1, e_3, e_5\}$,
$\{e_2, e_3, e_4\}$. If we associate a variable $x_i$ with each edge $e_i$ then
the minimal cuts generate an ideal. In this example we write

$$I_S = \langle x_1x_4,\ x_2x_5,\ x_1x_3x_5,\ x_2x_3x_4 \rangle.$$

The maximal cliques of the corresponding model simplicial complex $S$ are

$$\{15, 24, 123, 345\}.$$

We could, on the other hand, define $I_{S^*}$ as being generated by the
collection of all paths on the network. In this case $I_{S^*}$ is generated by
the minimal paths, giving:

$$\langle x_1x_2,\ x_4x_5,\ x_1x_3x_5,\ x_2x_3x_4 \rangle,$$

and $S^*$ consists of the complements of the cuts and has maximal cliques
$\{15, 24, 134, 235\}$. There is a duality between cuts and path models for
two-terminal networks:
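For a network this small, the minimal cuts, minimal paths and the maximal cliques of both complexes can be recovered by brute force over all $2^5$ edge subsets. In the sketch below a "path" is taken to be any edge set that connects input to output and a "cut" any edge set whose removal disconnects them; this reading, and all function names, are assumptions made for illustration.

```python
from itertools import combinations

EDGES = {1: (1, 2), 2: (2, 4), 3: (2, 3), 4: (1, 3), 5: (3, 4)}
SOURCE, SINK = 1, 4

def connects(edge_ids):
    """True if the given edges join SOURCE to SINK (depth-first search)."""
    reached, stack = {SOURCE}, [SOURCE]
    while stack:
        node = stack.pop()
        for e in edge_ids:
            u, v = EDGES[e]
            nxt = v if u == node else u if v == node else None
            if nxt is not None and nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return SINK in reached

SUBSETS = [frozenset(c) for k in range(len(EDGES) + 1)
           for c in combinations(EDGES, k)]

def minimal(sets):
    return [s for s in sets if not any(t < s for t in sets)]

def facets(generators):
    """Maximal edge sets containing none of the generating sets."""
    faces = [s for s in SUBSETS if not any(g <= s for g in generators)]
    return [s for s in faces if not any(s < t for t in faces)]

def labels(sets):
    return sorted(("".join(map(str, sorted(s))) for s in sets),
                  key=lambda t: (len(t), t))

min_paths = minimal([s for s in SUBSETS if connects(s)])
min_cuts = minimal([s for s in SUBSETS if not connects(set(EDGES) - s)])

cliques_S = labels(facets(min_cuts))        # complex from the cut ideal
cliques_S_star = labels(facets(min_paths))  # complex from the path ideal
```

The enumeration is exponential in the number of edges, so it is meant only as a check on small examples such as this one.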
There is a duality between cuts and path models for two-terminal networks: 
Lemma 5. The model simplicial complex $S$ based on the cut ideal $I_S$ of a
two-terminal network is formed from the complements of all paths on the network.
Conversely, the model simplicial complex $S^*$ based on the path ideal
$I_{S^*}$ is formed from the complements of all cuts. Moreover: $(S^*)^* = S$.

For example, the term 15 of $S$ is the complement of the (minimal) path 234 and
the term 14 in $S^*$ is the complement of the (non-minimal) cut 235.

This duality is a special example of Alexander duality and we omit the proof,
see Miller & Sturmfels (2005), Proposition 1.37. The general result says that
for a square-free $S$, if we define $S^*$ as the complement of all non-faces of
$S$, then

$$(S^*)^* = S.$$
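The general statement can be checked on the network example by computing the dual directly from its definition: collect the complements of all non-faces and keep the maximal ones. The function names are hypothetical and the enumeration is only practical for small vertex sets.

```python
from itertools import combinations

def all_faces(facets):
    """Every subset of every facet (including the empty set)."""
    return {frozenset(c) for f in facets
            for k in range(len(f) + 1)
            for c in combinations(sorted(f), k)}

def alexander_dual(facets, vertices):
    """Facets of S*: maximal complements of the non-faces of S."""
    faces = all_faces(facets)
    V = frozenset(vertices)
    subsets = [frozenset(c) for k in range(len(V) + 1)
               for c in combinations(sorted(V), k)]
    dual_faces = [V - s for s in subsets if s not in faces]
    return {f for f in dual_faces if not any(f < g for g in dual_faces)}

def as_sets(labels):
    return {frozenset(int(ch) for ch in lab) for lab in labels}

V = range(1, 6)
S = as_sets(["15", "24", "123", "345"])
S_star = alexander_dual(S, V)
S_back = alexander_dual(S_star, V)
```

Starting from the maximal cliques of $S$ for the two-terminal network, the dual reproduces the maximal cliques of $S^*$ given earlier, and dualizing again returns $S$.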

It will have been noticed that for this network $S$ and $S^*$ are self-dual in
the sense that the two simplicial complexes have the same structure and only
differ in the labelling of the vertices. Both models have two separate
conditional independence properties. Thus for $S$ we have
$X_1 \perp X_4 \mid (X_2, X_3, X_5)$ and $X_2 \perp X_5 \mid (X_1, X_3, X_4)$.



5.4 Geometric constructions 



Simplicial complexes are at the heart of algebraic topology and it is natural to
look in that field for classes of simplicial complexes whose abstract version may
be used to support hierarchical models. We mention briefly one class here,
arising from the fast-growing area of persistent homology, see Edelsbrunner &
Harer (2010). This class has already been used by Lunagomez et al. (2009) to
construct graphical models using so-called Alpha complexes. We give the
construction now. It is based on the cover provided by a union of balls in
$\mathbb{R}^d$, a construction used by Edelsbrunner (1995) in the context of
computational geometry and in Naiman & Wynn (1992) and Naiman & Wynn (1997) to
study Bonferroni bounds in statistics.

Thus, let $z_1, \ldots, z_p$ be $p$ points in $\mathbb{R}^d$ and define the solid
balls with radius $r$ centered at the points:

$$B_i(r) = \{z : \|z - z_i\| \le r\}, \quad i = 1, \ldots, p.$$

The nerve of the cover represented by the union of balls is the simplicial
complex $S$ derived from the intersections of the balls, and is called the Alpha
complex. It consists of exactly all index sets $J$ for which
$\bigcap_{i \in J} B_i(r) \ne \emptyset$.

When the radius, $r$, is small $S$ consists of unconnected vertices and the
hierarchical model gives independence of the $X_1, \ldots, X_p$. As
$r \to \infty$ there is a value of $r$ at and beyond which
$\bigcap_{i=1}^{p} B_i(r) \ne \emptyset$ and $S$ consists of a single complete
clique. In that case we have a full hierarchical model. Between these two cases,
and depending on the position of the $z_i$ and the value of $r$, we obtain a rich
class of simplicial complexes and hence hierarchical models. Some of these will
be decomposable and we refer to the discussion in Lunagomez et al. (2009).
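The dependence of the nerve on $r$ can be seen in a toy case. The sketch below assumes $d = 1$, where the balls are closed intervals and a family of them has a common point exactly when the spread of their centers is at most $2r$; in higher dimensions the intersection test is more involved and is not attempted here. The point positions are arbitrary.

```python
from itertools import combinations

def nerve_1d(z, r):
    """Faces of the nerve of the intervals [z_i - r, z_i + r] (1-based labels)."""
    faces = []
    for k in range(1, len(z) + 1):
        for J in combinations(range(1, len(z) + 1), k):
            pts = [z[i - 1] for i in J]
            # Closed intervals of equal radius share a point iff spread <= 2r.
            if max(pts) - min(pts) <= 2 * r:
                faces.append(J)
    return faces

def facets(faces):
    sets = [frozenset(f) for f in faces]
    return sorted(tuple(sorted(s)) for s in sets
                  if not any(s < t for t in sets))

z = [0.0, 1.0, 2.5, 4.0]   # illustrative centers on the real line

small = facets(nerve_1d(z, 0.25))   # isolated vertices: full independence
middle = facets(nerve_1d(z, 0.8))   # a chain of edges
large = facets(nerve_1d(z, 2.5))    # a single complete clique
```

With these centers the three radii give, respectively, four isolated vertices, the chain 12, 23, 34 and the single clique 1234, illustrating the two extreme cases and one intermediate model.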

It is the study of the topology of the nerve as $r$ changes, and in particular the
behavior of its Betti numbers, which drives the area of persistent homology. An
important theoretical and computational result is that this topology is also that
of the reduced simplicial complex based on the Delaunay complex associated with
the Voronoi diagram of the points. That is to say, for fixed $r$ it is enough,
from a topological (homotopy) viewpoint, to use the sub-complex $S^-$ of the
Delaunay complex contained in $S$. The theory derives from classical results of
Borsuk (1948) and Leray (1945). One beautiful fact is that the Delaunay dual
complex based on the furthest point Voronoi diagram (Okabe et al. 2000) is
obtained by the Alexander duality mentioned in the last subsection.

In this paper we have concentrated on the correspondence between $S$ and its
Stanley-Reisner ideal $I_S$. The use of $I_S$ is not always explicit in
persistent homology but is implicit in the underlying homology theory: see Sáenz
de Cabezón (2008) for a thorough investigation, including algorithms. Also,
although the topology of $S$ and its reduced Delaunay version is the same, if
their actual structure is different they lead to different hierarchical models.
One can also use non-Euclidean metrics to define the cover and, indeed, work in
different spaces and with other kinds of cover. Notwithstanding these many
interesting technical issues, the use of geometric constructions to define
interesting classes of hierarchical model promises to be very fruitful.



6 Conclusion 

There are many features and properties of monomial ideals which remain to be ex- 
ploited in statistics via the isomorphism discussed in the last section. We should 
mention minimal free resolutions, the closely related Hilbert series, Betti num- 
bers (including graded and multi-graded versions) and Alexander duality. It is 
pleasing that in the general case the development of the last section only requires 
consideration of square-free ideals, whose theory is a little easier than the full 
polynomial case. Fast algorithms are available for symbolic operations covering 
all these areas so that as further links are made they can be implemented. We have 
not covered statistical analysis in this paper. Further work is in progress to fit and 
test the zero-cumulant conditions $D^\alpha g = 0$ using, for example, kernel methods.